Method for description of audio-visual data content in a multimedia environment

ABSTRACT

For the description of audio-visual data content in a multimedia environment, a binary representation of description structures is used. The components are a binary identifier (BID) for each specified descriptor (D) and description scheme (DS), as well as a binary description definition language (DDL).
The advantages of a binary representation are a more compact encoding of description structures, and thus savings in storage capacity and/or bandwidth, and faster parsing of the encoded data, especially in the context of dynamically varying descriptions.

BACKGROUND OF THE INVENTION

[0001] In the context of the MPEG-7 multimedia description standard, description structures for the description of audio-visual data content use textual representations for descriptors and description schemes, where the number of description elements can be variable. The language elements of the so-called description definition language (DDL), which is derived from the Extensible Markup Language (XML), are also represented in textual form. The descriptor values (DVs) themselves are in a binary representation.

OBJECTIVE AND ADVANTAGES OF THE INVENTION

[0002] The objective of the present invention is to reduce the data size required for storage or transmission of the descriptions and to improve the searching and browsing speed of the descriptions. This objective is achieved according to claim 1 and the subclaims by a binary representation of description structures for the description of audio-visual data content in multimedia environments. The binary description is described using the example of the MPEG-7 multimedia description standard. While to date the specified description structures are solely described in a textual form, the invention replaces the textual representation by a binary representation. The method consists of several components:

[0003] Binary identifiers for descriptors (D) and description schemes (DS)

[0004] Binary definition of dynamic description schemes (DS)

[0005] Binary definition of dynamic descriptors (D)

[0006] Binary description definition language (DDL)

[0007] Each of these components can be used either separately or in mutual combination in order to achieve the mentioned objectives.

[0008] While the textual representation provides good readability for human beings, it is very inefficient both in terms of the data size required for the coded description structures and for parsing the encoded data. The presented invention applies a binary representation of the description structures that requires significantly less data for the coded description structures. Therefore, the storage capacity required for storing the encoded data or the bandwidth required for its transmission can be significantly reduced. Further, the binary representation decreases the time required for parsing the encoded data, especially in the context of dynamically varying descriptions. This allows the design of much faster browsing and searching procedures for respective databases.

DRAWINGS

[0009] Exemplifying embodiments of the invention are depicted in the drawings and will be explained in more detail in the description which follows.

[0010] FIG. 1 shows an example for definitions of dynamic description schemes by modification of static description schemes,

[0011] FIG. 2 shows an example for a newly defined (dynamic) description scheme using the binary DDL.

DESCRIPTION OF THE INVENTION

[0012] Before describing the method of the invention in detail, some definitions will be presented to better understand the details of the invention.

[0013] Present solutions use a textual representation of the description structures for the description of audio-visual data content in multimedia environments. For this task, a so-called description definition language (DDL) is used, which is derived from the Extensible Markup Language (XML). In the context of the remainder of this document, the following definitions are used:

[0014] Data: Data is audio-visual information that will be described using MPEG-7, regardless of storage, coding, display, transmission, medium, or technology.

[0015] Feature: A Feature is a distinctive characteristic of the data which signifies something to somebody.

[0016] Descriptor (D): A Descriptor is a representation of a Feature. A Descriptor defines the syntax and the semantics of the Feature representation.

[0017] Descriptor Values (DV): A Descriptor Value is an instantiation of a Descriptor for a given data set (or subset thereof) that describes the actual data.

[0018] Description Scheme (DS): A Description Scheme specifies the structure and semantics of the relationships between its components, which may be both Descriptors (Ds) and Description Schemes (DSs).

[0019] Description: A Description consists of a DS (structure) and the set of Descriptor Values (instantiations) that describe the Data.

[0020] Coded Description: A Coded Description is a Description that has been encoded to fulfil relevant requirements such as compression efficiency, error resilience, random access, etc.

[0021] Description Definition Language (DDL): The Description Definition Language is a language that allows the creation of new Description Schemes and, possibly, Descriptors. It also allows the extension and modification of existing Description Schemes.

[0022] Static DS: a DS that has been specified from the beginning and that is contained in a known dictionary of Ds and DSs.

[0023] Dynamic DS: a DS that is dynamically defined, using available static Ds and DSs.

[0024] The lowest level of the description is a descriptor. It defines one or more features of the data. Together with the respective DVs it is used to actually describe a specific piece of data. The next higher level is a description scheme, which contains two or more components and their relationships. Components can be either descriptors or description schemes. The highest level so far is the description definition language. It is used for two purposes: first, the textual representations of static descriptors and description schemes are written using the DDL. Second, the DDL can also be used to define a dynamic DS using static Ds and DSs. Finally, the DDL could possibly also be used for defining new Ds. In that case, the distinction between static and dynamic would also have to be made for Ds.

[0025] While a binary representation syntax is already defined for the DVs, the Ds and DSs are only represented in textual form. Also, the language elements of the DDL are only represented in textual form. Below is an example of a descriptor called "Dominant color" using the structural elements from the current DDL specification:

    <element name="DominantColor">
      <complexType>
        <element ref="ColorSpace"/>
        <element ref="ColorQuantization"/>
        <element name="DomColorValues" minOccursPar="DomColorsNumber">
          <complexType>
            <element name="ColorValue">
              <simpleType base="Int16" derivedBy="list">
                <lengthPar value="DomColorsNumber"/>
              </simpleType>
            </element>
            <element name="ColorVariance" minOccurs="0" maxOccurs="1">
              <simpleType base="unsignedInt1" derivedBy="list">
                <lengthPar value="DomColorsNumber"/>
              </simpleType>
            </element>
            <attribute name="Percentage" type="unsignedInt5"/>
          </complexType>
        </element>
        <attribute name="DomColorsNumber" type="unsignedInt3"/>
        <attribute name="VariancePresent" type="boolean"/>
        <attribute name="ConfidenceMeasure" type="unsignedInt5"/>
      </complexType>
    </element>

[0026] For the actual instantiation of the descriptor, describing a specific piece of data, there is already a binary representation syntax defined, referred to as “binary DV data” in the following. However, the complete description in binary form will consist of a binary header, followed by the binary DV data, possibly followed by a binary footer.

binary header | binary DV data | binary footer

[0027] The binary header is used to define which D or DS is used to describe the data, while the binary DV data is used to represent the actual DVs. The binary footer may optionally be used to signal the end of the descriptor.

[0028] According to the invention, a binary representation is used for all representations of description structures for the description of audio-visual data content in multimedia environments. The presented procedure consists of several components that are described in the following sections. The components can be used either separately or in mutual combination.

[0029] Binary Identifiers for Descriptors and Description Schemes

[0030] A binary identifier BID for each specified descriptor D and description scheme DS is the key element of the idea; it uniquely refers to an entry in a catalogue of the available descriptions. Using the BID and the binary DV representation, an exemplary structure of a D or DS could look as follows:

D-SC (opt) | D-DYN | BID | binary DV data | D-EC (opt)

[0031] Here, the D-SC (startcode) and the D-EC (endcode) can optionally be used to allow random access, resynchronisation, etc. The D-DYN (dynamic) element defines, so far only in the case of DSs, whether the DS is static or dynamic. BID is the binary identifier itself, and binary DV data is the binary representation of the descriptor values. In the state of the art, D-SC, D-EC, D-DYN and BID are not defined or used. The startcode and endcode should be chosen in such a way that the occurrence of an identical bitstring in the rest of the description is not possible or at least very unlikely. D-DYN can be represented by 1 bit of fixed length. Finally, for the definition of the BIDs, a unique bitstring is assigned to each D and DS. The assignment can be arbitrary or structured, as described in the following:

[0032] A) Arbitrary Assignment

[0033] A unique bitstring is assigned to each specified D and DS, with the bitstrings chosen arbitrarily. Thus, it is not possible to deduce any information about the type or class of the D or DS from a subset of the bitstring. This is the simplest way of defining BIDs. The bitstring can be either of fixed length (option 1) or of variable length (option 2).

[0034] The respective fixed or variable length codes are generated taking into account the overall number of Ds and DSs that have to be described. As an option, some additional bitstrings can be defined as reserved so that they can be used for future, as yet unknown purposes.
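The arbitrary assignment with fixed-length codes (option 1) and the record layout "D-SC (opt) | D-DYN | BID | binary DV data | D-EC (opt)" can be illustrated with a minimal Python sketch. It is not part of the specification: the function names, the use of character strings of "0" and "1" as bitstrings, the catalogue entries and the DV bits are all assumptions made for illustration. The code width is simply the smallest number of bits that covers all catalogue entries plus the reserved codes.

```python
def assign_fixed_length_bids(names, reserved=0):
    """Map each D/DS name to a unique fixed-length bitstring (option 1)."""
    total = len(names) + reserved              # leave room for reserved codes
    width = max(1, (total - 1).bit_length())   # smallest width covering all codes
    return {name: format(i, f"0{width}b") for i, name in enumerate(names)}


def encode_record(bid_bits, dv_bits, dynamic=False, start_code="", end_code=""):
    """Build D-SC (opt) | D-DYN | BID | binary DV data | D-EC (opt)."""
    d_dyn = "1" if dynamic else "0"            # D-DYN: 1 bit, fixed length
    return start_code + d_dyn + bid_bits + dv_bits + end_code


# Hypothetical catalogue of three entries plus two reserved codes.
catalogue = assign_fixed_length_bids(
    ["DominantColor", "ColorSpace", "ColorQuantization"], reserved=2
)
# Hypothetical DV payload "1011" for a static descriptor.
record = encode_record(catalogue["ColorSpace"], dv_bits="1011")
```

With five codes to cover, three bits per BID suffice in this sketch; a variable-length code (option 2) would instead give shorter bitstrings to the more frequently used Ds and DSs.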

[0035] B) Structured Assignment

[0036] Here, the assignment of bitstrings to Ds and DSs is done in a structured way. This means that the BID can be separated into parts, where each part of the BID has a specific meaning. The structured BID looks as follows:

D-type (opt) | D-class (opt) | D-subclass (opt) | D-name

[0037] The elements have the following meaning

[0038] D-type: description type, i.e. either descriptor (D) or description scheme (DS). Since there are only two possible values, this element is represented by 1 bit of fixed length.

[0039] D-class: class to which the D/DS belongs, e.g. Audio, Visual, Meta-data etc. This element can be represented either as a fixed length or as a variable length code.

[0040] D-subclass: the subclass to which the D/DS belongs, e.g. if the class is Visual, the subclasses could be Colour, Motion, Texture etc. Obviously, this element can only exist if the D-class element is also used. The element can be represented either as a fixed length or as a variable length code.

[0041] D-name: the unique name of the D/DS, i.e. the lowest level of the structure. This element is mandatory; it can be represented either as a fixed length or as a variable length code.

[0042] Except for D-name, the elements of the structured BID are defined as optional. However, the options are partially mutually exclusive. The following combinations are allowed:

[0043] D-type | D-class | D-subclass | D-name

[0044] D-type | D-class | D-name

[0045] D-type | D-name

[0046] D-class | D-subclass | D-name

[0047] D-class | D-name

[0048] As can be seen, there exist 5 possibilities to define a structured BID. The sixth possibility would be to use just D-name, which is equivalent to the arbitrary BID described in (A). With the described procedure, only Ds and static DSs can be described in binary form, but no dynamic DSs. A procedure for the definition of dynamic DSs is described in the next section.
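The structured assignment can be sketched in Python as follows. This is only an illustration under stated assumptions: the field widths (1, 3, 4 and 8 bits), the keyword names d_type, d_class, d_subclass and d_name, and the numeric field values are hypothetical, and fixed-length fields are used throughout although the text also permits variable length codes. The sketch accepts exactly the five listed combinations and rejects, for example, a D-subclass without a D-class.

```python
FIELD_ORDER = ("d_type", "d_class", "d_subclass", "d_name")
# Assumed fixed widths in bits; only the 1-bit D-type is given by the text.
FIELD_WIDTHS = {"d_type": 1, "d_class": 3, "d_subclass": 4, "d_name": 8}
ALLOWED = {
    ("d_type", "d_class", "d_subclass", "d_name"),
    ("d_type", "d_class", "d_name"),
    ("d_type", "d_name"),
    ("d_class", "d_subclass", "d_name"),
    ("d_class", "d_name"),
}


def pack_structured_bid(**fields):
    """Concatenate the present fields in the order type|class|subclass|name."""
    present = tuple(name for name in FIELD_ORDER if name in fields)
    if present not in ALLOWED or len(present) != len(fields):
        raise ValueError(f"combination not allowed: {sorted(fields)}")
    parts = []
    for name in present:
        value, width = fields[name], FIELD_WIDTHS[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit into {width} bits")
        parts.append(format(value, f"0{width}b"))
    return "".join(parts)


# Hypothetical values: D-type=1 (DS), D-class=2, D-name=107.
bid = pack_structured_bid(d_type=1, d_class=2, d_name=107)
```

A decoder would additionally need to know which of the five combinations is in use; the text does not say how this is signalled, so the sketch leaves it out.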

[0049] Binary Definition of Dynamic Description Schemes

[0050] In order to define a dynamic DS, there exist two possibilities. The first one is to use an arbitrary set of static Ds and/or DSs and put them together, forming a new, dynamic DS. The second possibility is to use a static DS as a basis and to modify it by adding new Ds/DSs and/or by removing existing Ds/DSs, as well as by changing the number of occurrences of Ds/DSs with the same BID at a specific level. The first possibility can only be realised using the DDL. In the following, a procedure is described which allows the binary definition of dynamic DSs for the second possibility, i.e. based on a static DS. The description in general looks as follows:

[0051] D-SC (opt) | D-DYN | BID-D | BID-P | EXT-NUM | EXT-1 | . . . |

[0052] EXT-M |

[0053] ELI-NUM | ELI-1 | . . . | ELI-N | binary DV data | D-EC (opt)

[0054] The elements D-SC, D-DYN, binary DV data and D-EC have already been described in the previous section. The additional elements have the following meaning:

[0055] BID-D: dynamic BID, i.e. the BID of the dynamic DS to be defined. It can be arbitrarily chosen or can be defined in a structured way as described in the previous section. However, the BID-D is dynamically generated and its full meaning is thus only known to the application that defined it.

[0056] BID-P: parent BID, i.e. the BID of the static DS which serves as a basis for the definition of the DS to be defined. This BID must be taken from the set of available static DSs and its meaning is known to all applications.

[0057] EXT-NUM: number of extensions, i.e. the number of Ds/DSs that are added to the parent DS.

[0058] EXT-1 . . . EXT-M: description of the extensions; the detailed syntax is described below.

[0059] ELI-NUM: number of eliminations, i.e. the number of Ds/DSs that are removed from the parent DS.

[0060] ELI-1 . . . ELI-N: description of the eliminations; the detailed syntax is described below.

[0061] The elements EXT-NUM and ELI-NUM can be represented by a fixed or variable length code. The extensions EXT-m and the eliminations ELI-n are realised as follows:

[0062] A) Definition of Extensions EXT-m

[0063] The syntax for the extensions is defined as follows:

POS-P | BID | OCC-NUM

[0064] Here, the elements are defined as follows:

[0065] POS-P: the position of the parent node, i.e. the “address” in the bitstring where the extension should be placed. This can be realised in two ways: first, the bitstream for a description is extended by an element that defines a unique position, i.e. “D-SC | POS-ID | . . . ”. By this, each D/DS in a description can be uniquely referenced. Second, the address is represented implicitly, either by counting the number of bits from the D-SC of the parent DS until the D/DS to be referenced, or by a 2D address specifying the hierarchy level on which the DS can be found and, within the level, the serial number of the D/DS to be referenced.

[0066] BID: the BID of the D/DS that is added to the parent DS at POS-P.

[0067] OCC-NUM: number of occurrences of the D/DS to be added to the parent node at POS-P.

[0068] B) Definition of Eliminations ELI-n

[0069] The syntax for the eliminations is defined as follows:

POS-ELI | OCC-NUM (opt)

[0070] Here, the elements are defined as follows:

[0071] POS-ELI: the position of the D/DS that is eliminated from the parent DS. The position can again be specified in the two ways described above for POS-P.

[0072] OCC-NUM: number of occurrences of the D/DS to be removed from the parent node at POS-ELI. This parameter is optional and is only applied for such Ds/DSs that can occur more than once in a static DS.

[0073] An example of applying both kinds of modification for the definition of dynamic DSs is given in FIG. 1. In the example, a binary tree representation is used for visualisation of the DSs. In FIG. 1 the parent DS is modified by adding an additional description scheme DS (step 1) and by removing a descriptor D (step 2).

[0074] In the given example, first DS 107 is added to DS 25 at parent node position 2.1. This includes the static Ds with BIDs 567 and 56, which are known as normative parts of DS 107. The new, dynamic DS is then assigned the BID-D 1345:

D-DYN=1 | BID-D=1345 | BID-P=25 | EXT-NUM=1 | POS-P=2.1 | D-type=DS | D-name=107 | OCC-NUM=1 | ELI-NUM=0 | binary DV data

[0075] In the second step, the D at position 3.1 (BID=1256) is eliminated and replaced by a “NULL” node (this still has the same position ID, but no content). The advantage of not only eliminating a D/DS, but replacing it by a null node, is that the position IDs of the remaining nodes remain unchanged. The modified DS gets the ID 1346:

D-DYN=1 | BID-D=1346 | BID-P=25 | EXT-NUM=0 | ELI-NUM=1 | POS-ELI=3.1 | binary DV data

[0076] The first and the second step can also be performed simultaneously. In the given example, the POS codes for the hierarchical structures consist of values <level.number>, where level=1 is the top level. Since the number of sub-description elements for each DS is known, the number index can also be generated automatically. If multiple description elements with the same ID can occur (maxOccurs>1 in the DDL expression), the POS code should be expressed as <level.number.occurrence>. Such a procedure has the advantage that POS values always remain unchanged as compared to the parent DS; numbers for new elements always start with max_number+1, where max_number is the highest number previously existing at a specific level. If a reduction at a higher level occurs, all underlying nodes are also eliminated.
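The effect of extensions and eliminations on the <level.number> POS codes can be sketched in Python as follows. The data layout is an assumption made for illustration: a DS is held as a dictionary from (level, number) tuples to nodes, and the parent tree below is only inferred from the BID and POS values quoted in the examples, not taken from FIG. 1. New nodes receive max_number+1 at their level, and an eliminated node is kept as a NULL node so that the POS codes of the remaining nodes do not change.

```python
import copy

NULL = None  # stand-in for the "NULL" node that keeps a position occupied


def extend(tree, pos_p, bid):
    """Add a D/DS below the node at pos_p and return the new POS code."""
    level = pos_p[0] + 1
    # Highest number already used at that level (0 if the level is empty).
    max_number = max((n for (lv, n) in tree if lv == level), default=0)
    pos = (level, max_number + 1)
    tree[pos] = {"bid": bid, "parent": pos_p}
    return pos


def eliminate(tree, pos_eli):
    """Replace the node at pos_eli, and all nodes below it, by NULL nodes."""
    tree[pos_eli]["bid"] = NULL
    for pos, node in tree.items():
        if node["parent"] == pos_eli and node["bid"] is not NULL:
            eliminate(tree, pos)


# Assumed layout of the parent DS 25 (inferred, not taken from the figure).
parent_ds = {
    (1, 1): {"bid": 25, "parent": None},
    (2, 1): {"bid": 107, "parent": (1, 1)},
    (2, 2): {"bid": 2567, "parent": (1, 1)},
    (3, 1): {"bid": 1256, "parent": (2, 1)},
    (3, 2): {"bid": 1, "parent": (2, 1)},
}

dynamic_ds = copy.deepcopy(parent_ds)
pos_107 = extend(dynamic_ds, (2, 1), 107)   # step 1: add DS 107 at POS-P=2.1
extend(dynamic_ds, pos_107, 567)            # normative parts of DS 107
extend(dynamic_ds, pos_107, 56)
eliminate(dynamic_ds, (3, 1))               # step 2: POS-ELI=3.1 becomes NULL
```

In this sketch the added DS lands at position 3.3 and its parts at 4.1 and 4.2, while position 3.2 keeps its code after the elimination of 3.1.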

[0077] The dynamically allocated BID-D is not normative, but only for proprietary use.

[0078] Binary Definition of Dynamic Descriptors

[0079] Up to now, only normative D elements could be used. In the following, a binary syntax is specified that enables user-defined D nodes, which can e.g. consist of private data (carrying description elements outside the scope of MPEG-7) or specify the semantics of a D by a feature extraction method.

D-SC (opt) | D-DYN | BID-D | DATA-TYPE | DATA-LENGTH | D-DATA | binary DV data | D-EC (opt)

[0080] The elements D-SC, D-DYN, binary DV data and D-EC have already been described in sections 2.4.1 and 2.4.2. The additional elements have the following meaning:

[0081] DATA-TYPE: type of data that defines the dynamic descriptor. Here, e.g. private data or the semantics of a feature extraction method can be contained.

[0082] DATA-LENGTH: length of the data that defines the descriptor.

[0083] D-DATA: the actual data that defines the descriptor.

[0084] The elements can be realised using either fixed length or variable length codes.
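The DATA-TYPE | DATA-LENGTH | D-DATA part of a dynamic descriptor is a type-length-value layout, which can be sketched in Python as follows. The field sizes (one byte for DATA-TYPE, two bytes for DATA-LENGTH, big-endian) and the type value 1 are assumptions made for illustration; the text leaves the code lengths open. The length field is what allows a parser to skip a user-defined descriptor it does not understand.

```python
import struct

HEADER = ">BH"  # assumed: 1-byte DATA-TYPE, 2-byte DATA-LENGTH, big-endian


def pack_dynamic_descriptor(data_type, d_data):
    """Return DATA-TYPE | DATA-LENGTH | D-DATA as bytes."""
    return struct.pack(HEADER, data_type, len(d_data)) + d_data


def unpack_dynamic_descriptor(buf):
    """Split buf into (DATA-TYPE, D-DATA, remaining bytes)."""
    data_type, length = struct.unpack_from(HEADER, buf, 0)
    start = struct.calcsize(HEADER)
    return data_type, buf[start:start + length], buf[start + length:]


# Hypothetical private data of type 1, followed by two further bytes that
# stand in for the binary DV data.
packed = pack_dynamic_descriptor(1, b"abc") + b"XY"
data_type, d_data, rest = unpack_dynamic_descriptor(packed)
```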

[0085] Binary Description Definition Language

[0086] This is intended as a one-to-one mapping of description structures defined in XML/DDL (or similar languages) from normative, pre-specified sub-elements. The purpose is to express DSs which are built from scratch in an efficient way by using the “catalogue” of unique binary identifiers. The syntax to define new DSs with the binary DDL is as follows:

BID-D | NOB | BID1 | B-FLAG | NOB-BID1 (opt) | BID11 | B-FLAG | NOB-BID11 (opt) | BID12 | B-FLAG | NOB-BID12 (opt) . . . BID1N | B-FLAG | NOB-BID1N (opt) | BID2 | B-FLAG | NOB-BID2 (opt) . . . BIDN | B-FLAG | NOB-BIDN (opt)

[0087] Here, the elements are defined as follows:

[0088] BID-D: the temporary BID of the dynamically defined DS (dynamicBID).

[0089] B-FLAG: indicates, if “1”, that the preceding DS has one or more children Ds or DSs, the number of which is specified in the following element. If “0”, it indicates that the DS has no further children. The element B-FLAG is only used if the preceding BID describes a DS; in case of the BID describing a D, the element is not necessary, since Ds cannot have children.

[0090] NOB: number of branches, i.e. the number of children Ds or DSs the dynamic DS shows in a binary tree representation at the first level.

[0091] BID1: the BID of the first static D or DS (from the unique normative catalogue) in the first level.

[0092] NOB-BID1: the static DS with BID1 can have several children Ds or DSs, the number of which is specified here.

[0093] BID11: the BID of the first static D or DS in the second level with respect to the DS with BID1.

[0094] NOB-BID11: the static DS BID11 can have several children, the number of which is specified here.

[0095] BID12: the BID of the second static D or DS in the second level with respect to the DS with BID1.

[0096] NOB-BID12: the static DS BID12 can have several children, the number of which is specified here.

[0097] BID1N: the BID of the Nth static D or DS in the second level with respect to the DS with BID1.

[0098] NOB-BID1N: the static DS BID1N can have several children, the number of which is specified here.

[0099] BID2: the BID of the second static D or DS (from the unique normative catalogue) in the first level.

[0100] NOB-BID2: the static DS with BID2 can have several children Ds or DSs, the number of which is specified here.

[0101] BIDN: the BID of the Nth static D or DS (from the unique normative catalogue) in the first level.

[0102] NOB-BIDN: the static DS with BIDN can have several children Ds or DSs, the number of which is specified here.

[0103] The defined elements can be represented using either a fixed length or a variable length code. Since the syntax given above is an optionally recursive one, it shall be described using the example shown in FIG. 2.

[0104] Using the above defined binary syntax, the example can be described as follows:

BID-D=25 | NOB=2 | BID1=107 | B-FLAG=1 | NOB-BID1=3 | BID11=1256 | BID12=1 | BID13=107 | B-FLAG=1 | NOB-BID13=2 | BID131=567 | NOB-BID131=0 | BID132=56 | BID2=2567 | B-FLAG=0

[0105] The syntax can also be described using a different representation, as shown below. Here, each line corresponds to one element of the binary representation; for better understanding, the corresponding positions from FIG. 2 and the type of the description (D or DS) are included as comments (“/*”):

    BID-D=25      /* POS=1.1, DS
    NOB=2
    BID1=107      /* POS=2.1, DS
    B-FLAG=1
    NOB-BID1=3
    BID11=1256    /* POS=3.1, D
    BID12=1       /* POS=3.2, D
    BID13=107     /* POS=3.3, DS
    B-FLAG=1
    NOB-BID13=2
    BID131=567    /* POS=4.1, D
    BID132=56     /* POS=4.2, D
    BID2=2567     /* POS=2.2, DS
    B-FLAG=0
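The recursive character of the binary DDL syntax can be made explicit with a short Python sketch that flattens a tree of Ds and DSs into the element sequence. It is an illustration only: the node layout (BID, is-DS flag, list of children) and the function names are assumptions, and the elements are emitted as labelled values rather than as bits. The sketch follows the line-by-line listing, in which a D is followed by neither a B-FLAG nor a NOB element.

```python
def serialize_ddl(bid_d, children):
    """Flatten a dynamic DS into BID-D | NOB | BID | B-FLAG | NOB (opt) ..."""
    elements = [("BID-D", bid_d), ("NOB", len(children))]
    for child in children:
        _emit(child, elements)
    return elements


def _emit(node, elements):
    bid, is_ds, children = node
    elements.append(("BID", bid))
    if not is_ds:
        return                      # a D has no children, so no B-FLAG follows
    elements.append(("B-FLAG", 1 if children else 0))
    if children:
        elements.append(("NOB", len(children)))
        for child in children:
            _emit(child, elements)  # same syntax, one level deeper


# The example values from the text, as (BID, is_ds, children) nodes.
ds_107_inner = (107, True, [(567, False, []), (56, False, [])])
ds_107 = (107, True, [(1256, False, []), (1, False, []), ds_107_inner])
ds_2567 = (2567, True, [])
elements = serialize_ddl(25, [ds_107, ds_2567])
```

A parser for this sequence would need the catalogue to decide, for each BID read, whether it denotes a D or a DS, because only a DS is followed by a B-FLAG.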

1. Method for a description of audio-visual data content in a multimedia environment comprising: a binary identifier (BID) for each specified descriptor (D) and description scheme (DS) contained in a known dictionary, which descriptor (D) is a representation of a feature for a distinctive characteristic of data and which description scheme (DS) specifies the structure and semantics of the relationships between its components, being descriptors (D) and/or description schemes (DS), a binary description definition language (DDL).
 2. Method according to claim 1 comprising that for the multimedia environment the MPEG-7 multimedia description standard is used and that a hierarchical description is used with the lowest level of description being a descriptor (D), the next higher level being the description scheme (DS) and a further higher level being the description definition language (DDL).
 3. Method according to claims 1 or 2 further comprising a unique bitstring assigned to each specified descriptor (D) and description scheme (DS).
 4. Method according to claim 3 further comprising that the bitstring is of fixed or variable length.
 5. Method according to claims 1 or 2 further comprising that the binary identifiers (BID) are structured in several parts, where each part has a specific meaning such as type, class, subclass or name.
 6. Method according to one of claims 1 to 5 comprising dynamic description schemes, where an arbitrary set of static descriptors (D) and/or description schemes (DS) are put together forming a dynamic description scheme (DS).
 7. Method according to one of claims 1 to 5 comprising dynamic description schemes by using a static descriptor (D) as a basis and to modify this descriptor by adding new descriptor or description scheme information and/or by removing existing descriptor or description scheme information.
 8. Method according to one of claims 1 to 7 comprising the position of parent nodes (POS-P) defining an address in the bitstring where an extension is placed.
 9. Method according to claim 8 comprising an element that defines a unique position for an extension.
 10. Method according to claim 8 comprising that the address is represented implicitly by either counting the number of bits from a startcode (D-SC) of the parent description scheme (DS) until the descriptor (D) or description scheme (DS) to be referenced or by a two dimensional address specifying the hierarchical level on which the description scheme (DS) can be found and within the level, specifying the serial number of the descriptor or the description scheme to be referenced.
 11. Method according to one of claims 9 or 10 comprising that a position element (POS-ELI) is provided, marking the position of the descriptor (D) or description scheme (DS) to be eliminated from a parent description scheme (DS).
 12. Method according to claim 11 comprising that in case of an elimination the bitstring in this position is replaced by a null node, so that position identifications of remaining nodes remain unchanged.
 13. Method according to one of claims 1 to 12 comprising that user defined descriptor nodes are provided, carrying description elements especially outside the scope of MPEG-7.
 14. Method according to one of claims 1 to 13 comprising that the binary description language (DDL) is provided to define new description schemes (DS) and especially describing the number of branches (NOB . . . ) on different description levels. 